We will never get anywhere along these lines, and the reason is simple: the problem is the term "underwater", because it sits in the "top-level" logical space. Underwater affects everything in the prompt. It's in the same logical space as everything else you could put in the prompt, so the model tries to blend underwater into everything.
Note: When I say "top-level logical space", these are terms I made up; they are not official Stable Diffusion terms.
Insight: Make sure you are thinking about the terms in your prompts at the logical level they actually occupy, taking care not to misalign them. We'll take a look at how to do this successfully in the next section.
There is a well-known saying given to art students that goes something like:
Draw what you see, not what you think you see.
The same is true for Stable Diffusion models. In the above example, we think we see that we are underwater, and logically, that is true. But the model doesn't know anything about our logic. It only knows images.
Let's therefore think about what we actually see. Well, we see a room, with a glass window with water and fish on the other side.
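To make that concrete, the reworked prompt might look something like the sketch below (the wording here is my own illustration, not the author's original prompt):

a cozy room interior, large glass window, water and fish visible on the other side of the glass, soft ambient light

The watery elements now describe only what is behind the glass, instead of sitting at the top level where they would bleed into the whole scene.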
Now that’s more like it. We could keep refining this prompt to squeeze more quality out of it, but that is outside the scope of this guide. For now, we just want to make sure we are choosing our words as carefully as we can when it comes to constructing good quality prompts.
Insight: Think about how the model wants to generate images. Think about what the model is trying to blend together. Set up the model to blend attributes that are in the same "logical layer". Do this by thinking clearly about what the image in your head actually looks like.
This is because items can be positionally dependent. When Stable Diffusion "digests" your prompt, it does so in chunks. Sometimes, certain things get bumped out of chunks to make room for other terms. However, the very first thing in each chunk will always be processed. This is why you can set the weight to 0.1 and still have it affect the image.
The above paragraph is an oversimplification, but it gets the point across.
It's tempting to think "well, I'll just increase the weight of the red outfit then!", but this would be a mistake. If you go down this road, you will find yourself adjusting weights endlessly trying to refine your prompt, without realizing that position also matters.
Rather than adjust the weight of "red outfit", let's move it to just before "1girl".
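The full prompt isn't reproduced in this excerpt, so here is a hypothetical before-and-after just to illustrate the move (the surrounding terms are placeholders of my own):

Before: 1girl, yellow eyes, smiling, standing in a garden, red outfit
After: red outfit, 1girl, yellow eyes, smiling, standing in a garden

By putting "red outfit" at the front, its position does the work that cranking its weight was failing to do.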
Insight: The position of a term in your prompt carries significant "weight" on its own. This positional weight is sometimes enough to overpower other weights in your prompt. If you find you are having a difficult time getting Stable Diffusion to respect your weights, try shifting terms around instead. Therefore you should…
Focus on Composition first, then add details. This builds off of the previous section, and won't have any images to go with it.
Put the most important things into your prompt first. Then, once you have everything that is critical to your image, start adding refining details. If you craft your images in this incremental fashion, you won't run into problems like some random term you didn't expect anchoring the whole image, such as the yellow eyes term in the previous section. There it was obvious what the offending term was, but it won't always be so obvious.
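As a hypothetical illustration of this incremental approach (these prompts are my own sketch, not taken from the guide), a first pass might lock in only the composition:

1girl, walking along a beach at sunset, full body shot

and only once that composition is showing up reliably would you layer in refinements:

1girl, walking along a beach at sunset, full body shot, flowing white dress, windswept hair, soft warm lighting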
Suggestion: Turn off hi-res fix when you are trying to refine your composition. Hi-res fix won't "fix" your image unless you're using really high base resolutions. Keep your resolution at 512x768 and just slam out images looking for that perfect composition.
Insight: Build your prompts with composition in mind first. Once the model starts giving you the broad strokes of what you want, then add details to your prompt. Otherwise, you will be trying to de-conflict details and composition at the same time.
Many Stable Diffusion models are chock full of photograph data. A great many of these photographs are labeled with formal photography terms that describe the photo. Therefore, learning a basic set of photography terms is a very good way to instantly gain control over your prompts and images.
IMPORTANT NOTE: If you put something in the prompt, the model will try to generate it. For example, if you say "close up face shot" and then put "shoes", the model will struggle, won't know how to blend those together, and will likely ignore your close-up term. Likewise, if you go into detail describing the subject's clothing, don't be surprised when the model refuses to obey your photography terms and only does a full shot, because it has all this prompt data telling it that it needs to draw this specific outfit.
Since the model is trying to generate all the things in your prompt, if your photography term is the easiest to ignore, well then guess what happens…it gets ignored.
Insight: Having a basic knowledge of photography terms can instantly give you more control over your images. However, in my personal experience, Stable Diffusion sometimes needs to be coerced into respecting the photography terms in your prompts, so the weight might have to be a bit higher than you expect. If I had to guess, it's because the photography term is the easiest for the model to ignore if there are too many conflicting things in the prompt, but that's my speculation. The shots covered above are only just scratching the surface. If you want to know more, I suggest this blog post from StudioBinder.
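As an example of what that weighting can look like (my own sketch, with placeholder terms and a weight I picked arbitrarily):

(cowboy shot:1.3), 1girl, red dress, standing on a rainy city street at night

The shot type leads the prompt and carries extra weight, so the framing is much harder for the model to ignore than the clothing and scene details.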
Textual Inversions (TIs) are basically "super terms" that can be put into your prompts. The great thing about TIs is that they can be shifted around easily, and you can apply weights to them. That last part is huge, and cannot be overstated, so I'll repeat it.
You can apply weight values to your textual inversions.
One of the most popular TIs is bad-hands-5. It's normally pretty good, but sometimes it just doesn't quite deliver. Well, you can just crank that baby up like (bad-hands-5:1.5) and all of a sudden you will notice that your images' hands are better.
Now I know what you're thinking: "Ok, so I'll just crank up the weights on these TIs all day and I'll generate the best images of my life." Not so fast. Cranking up the weights on TIs does come with a cost. It can clamp down on image creativity. If you ratchet up FastNegativeV2 to (FastNegativeV2:2), you will feel the AI start to struggle, as you have put it into a tiny little box where everything has to be exact, and that's not how Stable Diffusion generates its best images. But what if you still wanted FastNegativeV2, but you didn't want it to clamp down so harshly? Well, you can reduce its weight, like (FastNegativeV2:0.8).
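Putting the two adjustments from this section together, a negative prompt might end up looking something like this (the combination is my own example, not one given in the guide):

Negative prompt: (FastNegativeV2:0.8), (bad-hands-5:1.5)

FastNegativeV2 is dialed back so it doesn't box the model in, while bad-hands-5 is turned up because hands are usually worth the extra force.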
Let's relate TIs to roadside guard rails. Guard rails are great, but not if they force you to drive in a strictly straight line. You want just enough guard rails so that you don't go flying off the road, but you can still go where you want.
Here is a list of some excellent TIs you should consider using:
FastNegativeV2 - All purpose negative for all types of content. Jack of all trades. Master of None. Good enough for almost everything.
BadDream / UnrealisticDream - Instant quality increases. BadDream is for general quality; UnrealisticDream is more for photo-realism. The quality increase is more subtle than with FastNegativeV2, showing up mostly as finer detail.
fcPortrait Suite - The easiest way to get portraits, bar none. These TIs are criminally underrated. Also includes fcNegative, which is another great negative with a lighter touch than FastNegativeV2 (this is subjective though).
CyberRealistic Negative - A great negative geared towards photo-realism.
bad-hands-5 - Pretty self-explanatory. Focuses on making hands look normal.
Insight: Textual Inversions, sometimes called Embeddings, are a very easy way to increase image quality. They offer a wide degree of control, especially considering their weight can be adjusted.
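For a photo-realistic image, a negative prompt built from the TIs above might look roughly like this (my own combination; the trigger word for each embedding is whatever filename you saved it under, so adjust the names to match your files):

Negative prompt: CyberRealistic_Negative, (UnrealisticDream:0.8), (bad-hands-5:1.4)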
If an image is worth a thousand words, then a LoRA is worth a thousand prompts. LoRAs are so powerful that there are countless articles written about them. I'm not going to cover those. Instead, I'm going to focus on a couple of LoRAs that I find most useful.
LowRA - Stable Diffusion really likes bright images. Sometimes you don't want bright images. In those cases, you use LowRA. Here is the base image with LowRA set to 0.7.
As we can see, simply applying LoRAs can bring useful properties to our images, without any effort spent trying to describe to the model what you are looking for. But consider what you would have to type into your prompt to achieve these effects. You could use "darker", "ultradetailed", etc., but those terms get cumbersome quickly, and they may not even have the desired effect.
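For reference, in A1111 a LoRA is invoked directly in the prompt with its own weight, so applying LowRA at the strength used above would look something like this (the surrounding prompt is just a placeholder of mine):

a dimly lit alley at night, 1girl, looking at viewer, <lora:LowRA:0.7>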
Insight: If you're not using LoRAs, you should. They are often the only way to achieve certain effects. They are extremely powerful when used correctly, and can instantly give you the tools you need to craft your image.
You might have seen ADetailer being mentioned here and there, and wondered what it was. ADetailer is "Restore faces" on steroids. It's a collection of smaller AIs that perform post-processing effects on your images, after Hires Fix. Why is this useful? It fixes faces. It fixes eyes. It fixes hands. In other words, it fixes everything Stable Diffusion normally has trouble with.
It uses simple inpainting to do this. You can even specify different prompts for each fix.
As an example, you can tell the ADetailer face scan to color the eyes a certain color, and then you don't even need to put the eye color in your main prompt, which instantly reduces term collision.
Once you install ADetailer, you will see it as an extra panel in your settings. The yellow arrow points to the ADetailer model we want to scan with, in this case face_yolov8n.pt. This tells ADetailer to scan for faces, and then apply the Positive Prompt (green arrow) and Negative Prompt (red arrow) via inpainting.
Let's change her eyes to blue using ADetailer.
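The screenshots aren't reproduced here, so as a rough sketch, the ADetailer panel would be filled in along these lines (the negative prompt is my own guess, not necessarily what the author used):

ADetailer model: face_yolov8n.pt
ADetailer positive prompt: blue eyes
ADetailer negative prompt: (FastNegativeV2:0.8)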
Remember, there is no mention of blue eyes anywhere in our original prompt, and the original seed image did not have blue eyes. Those blue eyes came purely from ADetailer. But also look at the two faces. ADetailer touched up the face in the second image as well; it now looks a little more airbrushed.
Now, I actually think the eyes are a little TOO blue, so let's try to make them more realistic. Let's adjust the weight to make them slightly less blue.
Much better, but I liked the original face, if I am honest. Let's change the model from face_yolov8n to mediapipe_face_mesh_eyes_only. This ADetailer model will only scan for eyes and touch up those, leaving face details intact. Let's try it. But as this isn't the face model anymore, we need to edit our prompt.
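The revised setup might look roughly like this (the exact weight isn't given in this excerpt, so 0.8 is just my illustration of "slightly less blue"):

ADetailer model: mediapipe_face_mesh_eyes_only
ADetailer positive prompt: (blue eyes:0.8)

Because this model masks only the eye region, the prompt no longer needs to say anything about the face as a whole.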
Just like with the face and eyes, let's specify a positive and negative prompt, but this time we are going to select the hand_yolov8n.pt model. Let's also turn up the detection model confidence threshold, because this model will evaluate a lot of stuff as hands. Also, notice the (bad-hands-5:1.5). Because we are inpainting only the hands, we can really crank up the power on this TI.
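A sketch of that hand pass (the positive prompt and the exact threshold value are placeholders of mine; the text only says the threshold was raised above its default):

ADetailer model: hand_yolov8n.pt
ADetailer positive prompt: detailed hands
ADetailer negative prompt: (bad-hands-5:1.5)
Detection model confidence threshold: 0.6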
This is almost perfect, with the exception that the wrist cuffs and collar are black, not white. So let's use the Extra Seed function in A1111 and see if we can't squeeze out white cuffs and a white collar, while keeping the composition (and prompt) just the way it is.
Now, what this will do is generate tiny changes on top of your original seed. It's basically a whole denoiser that sits on top of the original image, and you get to determine how powerful it is and what seed the denoiser uses. What we are doing here is telling the extra seed function to generate a 5% change on top of the original image. Hopefully this is enough to get some white cuffs, if we just let it run.
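In A1111 terms, that setup looks roughly like this (checking the "Extra" box next to the seed field exposes these controls; the values beyond the 5% strength are my own illustration):

Seed: the seed of the image you want to keep
Extra: checked
Variation seed: -1 (random, so each run tries a slightly different variation)
Variation strength: 0.05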
Whenever I do this, I always turn off hires fix and ADetailer, because I am looking for the seed to give me something specific. If I get the white cuffs, I can reuse the extra seed and then turn hires fix and ADetailer back on.
But first, let's let it work.
…and after 23 attempts, I got one with a white collar. Remember, this image has no hires fix or ADetailer, so the quality is lacking; however, you can see that it's still very similar to the base image.
I’m going to go with this image. If I want to touch this image up even more, I’ll have to resort to either more prompt refinement, rolling through more seeds, straight up inpainting, or other more advanced methods.
Insight: Using the Seed + Extra Seed can sometimes help you round off rough edges in the images you generate. If you generate an image that's almost perfect, try the extra seed function and see if you can't get the model to generate an image that's just different enough that it retains its original luster but removes the unwanted rough edge.